39 research outputs found

    Significant Subgraph Mining with Multiple Testing Correction

    The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown to lead to greater statistical power (that is, the discovery of more truly significant itemsets) than the standard Bonferroni correction on real-world datasets. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power. Comment: 18 pages, 5 figures, accepted to the 2015 SIAM International Conference on Data Mining (SDM15).
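The idea of pruning untestable hypotheses can be illustrated with a minimal sketch. It assumes Fisher's exact test on 2x2 tables and a Tarone-style correction; neither is named in the abstract, and the function names `min_attainable_pvalue` and `tarone_factor` are hypothetical. A pattern whose smallest attainable p-value already exceeds the corrected threshold can never be significant, so it need not count towards the correction factor.

```python
from math import comb


def min_attainable_pvalue(support, n_pos, n_total):
    """Smallest two-sided Fisher p-value reachable by any 2x2 table whose
    margins are fixed by the pattern support and the class sizes."""
    lo = max(0, support - (n_total - n_pos))
    hi = min(support, n_pos)
    denom = comb(n_total, support)
    # Hypergeometric probability of each feasible table (a = hits in the positive class).
    probs = [comb(n_pos, a) * comb(n_total - n_pos, support - a) / denom
             for a in range(lo, hi + 1)]
    # Two-sided p-value of a table: total mass of tables that are no more likely.
    return min(sum(q for q in probs if q <= p * (1 + 1e-9)) for p in probs)


def tarone_factor(min_pvalues, alpha=0.05):
    """Smallest k such that at most k hypotheses are testable at level alpha / k."""
    k = 1
    while sum(1 for p in min_pvalues if p <= alpha / k) > k:
        k += 1
    return k


# Illustrative data (assumed): 20 transactions, 10 in the positive class,
# and four candidate patterns with the supports listed below.
supports = [1, 2, 5, 10]
min_ps = [min_attainable_pvalue(s, 10, 20) for s in supports]
k = tarone_factor(min_ps)
print("correction factor:", k, "instead of the Bonferroni factor", len(supports))
```

In this toy setting only the high-support patterns can ever reach a small p-value, so the correction factor is smaller than the Bonferroni factor and the per-test threshold `alpha / k` is less strict.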

    Clustering techniques for base station coordination in a wireless cellular system

    In this project, we attempt to provide enhancements for future mobile communications systems by carrying out a thorough study of base-station coordination in cellular MIMO systems. Our work can be divided into two main blocks. In the first part, we focus on linear MIMO signal processing techniques such as linear spatial precoding and linear spatial filtering. Starting from the state of the art in that area, we have developed novel MMSE precoders which include per-cell power constraints, and a new formulation which, apart from minimizing the intra-cluster MSE, tries to keep inter-cluster interference at low levels. In the second part, we study the impact that the particular mapping of cells to clusters, which defines which base stations can be coordinated with one another, has on the overall performance of the mobile communication radio access network. The applicability of existing clustering algorithms from the field of machine learning has been studied, resulting in a set of novel algorithms that we developed by adapting existing general-purpose clustering solutions to the problem of dynamically partitioning a set of cells according to the instantaneous signal propagation conditions. All our contributions have been tested by simulation of a cellular mobile communication system based on 3GPP signal propagation models for LTE. According to the results obtained, the techniques proposed throughout this project provide a considerable increase in both the average and median user rates in the system with respect to existing solutions. The inter-cluster interference-awareness we introduced in the formulation of MMSE precoders dramatically increases the performance of coordinated cellular MIMO systems when compared with traditional Wiener precoders. In addition, our dynamic base-station clustering has been shown to significantly enhance user rates while using smaller clusters than existing solutions based on static partitions of the base-station deployment. Ingeniería de Telecomunicación.

    Differentiable Clustering with Perturbed Spanning Forests

    We introduce a differentiable clustering method based on minimum-weight spanning forests, a variant of spanning trees with several connected components. Our method relies on stochastic perturbations of solutions of linear programs, for smoothing and efficient gradient computations. This allows us to include clustering in end-to-end trainable pipelines. We show that our method performs well even in difficult settings, such as datasets with high noise and challenging geometries. We also formulate an ad hoc loss to efficiently learn from partial clustering data using this operation. We demonstrate its performance on several real-world datasets for supervised and semi-supervised tasks.
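The smoothing step described above can be sketched as follows. This is only an illustration under stated assumptions: it builds a spanning forest with Kruskal's algorithm stopped at `k` components, perturbs the edge weights with Gaussian noise, and averages the resulting co-membership matrices by Monte Carlo. The names `forest_clusters` and `smoothed_comembership`, the noise model, and the toy data are hypothetical, and the gradient computation mentioned in the abstract is not reproduced here.

```python
import random


def forest_clusters(weights, n, k):
    """Kruskal's algorithm stopped once k components remain.
    weights maps an edge (i, j) to its weight; returns one root label per node."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1]):
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return [find(i) for i in range(n)]


def smoothed_comembership(weights, n, k, sigma=0.1, samples=200, seed=0):
    """Monte Carlo average of the 0/1 co-membership matrix under noisy weights."""
    rng = random.Random(seed)
    avg = [[0.0] * n for _ in range(n)]
    for _ in range(samples):
        noisy = {e: w + sigma * rng.gauss(0.0, 1.0) for e, w in weights.items()}
        labels = forest_clusters(noisy, n, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    avg[i][j] += 1.0 / samples
    return avg


# Toy data (assumed): four points on a line forming two well-separated pairs.
points = [0.0, 1.0, 10.0, 11.0]
edges = {(i, j): abs(points[i] - points[j])
         for i in range(4) for j in range(i + 1, 4)}
M = smoothed_comembership(edges, n=4, k=2)
```

With small noise the averaged matrix stays close to the hard clustering; larger values of `sigma` give fractional entries, which is what makes the output vary smoothly with the input weights.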

    Interference-aware MIMO precoder design with realistic power constraints

    Proceedings of the 2013 IEEE International Conference on Communications Workshops (ICC), held June 9-13, 2013 in Budapest, Hungary. In this work, an interference-aware precoder design is proposed for a downlink wireless cellular system. Each base station designs a precoder with a joint MMSE-ZF criterion for the user information and the interference to other cells. In a realistic power constraint scenario, where each base station has a limit on the maximum power available for transmission, the precoder filter can be solved analytically, and this solution is provided. The simulated performance of the interference-aware filter in terms of achievable rates and MSE shows some advantages compared to other solutions in the literature designed with the aim of full interference cancellation, such as block diagonalization schemes. This work has been partly funded by projects GRE3N (TEC2011-29006-C03-01/02/03) and COMONSENS (CSD 2008-00010).

    Genome-wide detection of intervals of genetic heterogeneity associated with complex traits

    Motivation: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or (ii) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals. Results: Here, we present an approach that overcomes both problems: it allows one to automatically find all contiguous sequences of single nucleotide polymorphisms in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana genome-wide association study data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping. Conclusions: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes. Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    Efficient and Modular Implicit Differentiation

    Automatic differentiation (autodiff) has revolutionized machine learning. It allows expressing complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization as a layer, and in bi-level problems such as hyper-parameter optimization and meta-learning. However, the formulas for these derivatives often involve tedious case-by-case mathematical derivations. In this paper, we propose a unified, efficient and modular approach for implicit differentiation of optimization problems. In our approach, the user defines (in Python in the case of our implementation) a function F capturing the optimality conditions of the problem to be differentiated. Once this is done, we leverage autodiff of F and implicit differentiation to automatically differentiate the optimization problem. Our approach thus combines the benefits of implicit differentiation and autodiff. It is efficient as it can be added on top of any state-of-the-art solver and modular as the optimality condition specification is decoupled from the implicit differentiation mechanism. We show that seemingly simple principles allow us to recover many recently proposed implicit differentiation methods and create new ones easily. We demonstrate the ease of formulating and solving bi-level optimization problems using our framework. We also showcase an application to the sensitivity analysis of molecular dynamics. Comment: V2: some corrections and link to software.
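The principle described above can be sketched on a scalar toy problem. This is not the paper's implementation: the objective, the solver, and the function names are assumptions chosen for illustration, and central finite differences stand in for autodiff so that the sketch needs only the standard library. If x*(θ) satisfies F(x*(θ), θ) = 0, the implicit function theorem gives dx*/dθ = -(∂F/∂x)⁻¹ ∂F/∂θ, regardless of which solver produced x*.

```python
def F(x, theta):
    """Optimality condition (stationarity) of the assumed toy objective
    f(x, theta) = 0.5 * (x - 3)**2 + 0.5 * theta * x**2."""
    return (x - 3.0) + theta * x


def solve(theta, steps=200, lr=0.5):
    """Any solver works here; plain gradient descent is used as a placeholder."""
    x = 0.0
    for _ in range(steps):
        x -= lr * F(x, theta)
    return x


def implicit_derivative(F, x_star, theta, eps=1e-6):
    """dx*/dtheta from the implicit function theorem.
    Finite differences replace autodiff in this sketch."""
    dF_dx = (F(x_star + eps, theta) - F(x_star - eps, theta)) / (2 * eps)
    dF_dtheta = (F(x_star, theta + eps) - F(x_star, theta - eps)) / (2 * eps)
    return -dF_dtheta / dF_dx


theta = 0.5
x_star = solve(theta)                       # closed form: 3 / (1 + theta)
grad = implicit_derivative(F, x_star, theta)  # closed form: -3 / (1 + theta)**2
```

The solver and the derivative rule are decoupled: `implicit_derivative` only needs `F` and the solution, which mirrors the modularity the abstract describes.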

    Hyperoxemia and excess oxygen use in early acute respiratory distress syndrome: Insights from the LUNG SAFE study

    Publisher Copyright: © 2020 The Author(s). Copyright 2020 Elsevier B.V., All rights reserved. Background: Concerns exist regarding the prevalence and impact of unnecessary oxygen use in patients with acute respiratory distress syndrome (ARDS). We examined this issue in patients with ARDS enrolled in the Large observational study to UNderstand the Global impact of Severe Acute respiratory FailurE (LUNG SAFE) study. Methods: In this secondary analysis of the LUNG SAFE study, we wished to determine the prevalence and the outcomes associated with hyperoxemia on day 1, sustained hyperoxemia, and excessive oxygen use in patients with early ARDS. Patients who fulfilled criteria of ARDS on day 1 and day 2 of acute hypoxemic respiratory failure were categorized based on the presence of hyperoxemia (PaO2 > 100 mmHg) on day 1, sustained (i.e., present on day 1 and day 2) hyperoxemia, or excessive oxygen use (FIO2 ≥ 0.60 during hyperoxemia). Results: Of 2005 patients who met the inclusion criteria, 131 (6.5%) were hypoxemic (PaO2 < 55 mmHg), 607 (30%) had hyperoxemia on day 1, and 250 (12%) had sustained hyperoxemia. Excess FIO2 use occurred in 400 (66%) out of 607 patients with hyperoxemia. Excess FIO2 use decreased from day 1 to day 2 of ARDS, with most hyperoxemic patients on day 2 receiving relatively low FIO2. Multivariate analyses found no independent relationship between day 1 hyperoxemia, sustained hyperoxemia, or excess FIO2 use and adverse clinical outcomes. Mortality was 42% in patients with excess FIO2 use, compared to 39% in a propensity-matched sample of normoxemic (PaO2 55-100 mmHg) patients (P = 0.47). Conclusions: Hyperoxemia and excess oxygen use are both prevalent in early ARDS but are most often non-sustained. No relationship was found between hyperoxemia or excessive oxygen use and patient outcome in this cohort. Trial registration: LUNG-SAFE is registered with ClinicalTrials.gov, NCT02010073. Peer reviewed.